Adding different types of parallelism to the elementwise layer by AndreySorokin7 · Pull Request #222 · embedded-dev-research/ITLabAI

AndreySorokin7 · 2025-11-04T09:24:41Z

No description provided.

allnes · 2025-11-08T11:46:08Z

  EWLayerImpl() = delete;
  EWLayerImpl(const Shape& shape, std::string function, float alpha = 0.0F,
-              float beta = 0.0F);
+              float beta = 0.0F, int type_parall = 0);


Use a strongly-typed backend enum instead of int for readability and safety.

enum class ParBackend { Seq = 0, Threads = 1, TBB = 2, OMP = 3 };

Propagate ParBackend through the API instead of a raw int.
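A minimal sketch of what the enum-based constructor could look like. `EWLayerSketch` and its accessors are hypothetical stand-ins for EWLayerImpl (the Shape argument is omitted), not the project's actual class:

```cpp
#include <string>
#include <utility>

// Strongly-typed backend selector, as proposed in the review comment.
enum class ParBackend { Seq = 0, Threads = 1, TBB = 2, OMP = 3 };

// Hypothetical stand-in for EWLayerImpl; only the constructor shape matters.
class EWLayerSketch {
 public:
  EWLayerSketch(std::string function, float alpha = 0.0F, float beta = 0.0F,
                ParBackend backend = ParBackend::Seq)
      : function_(std::move(function)),
        alpha_(alpha),
        beta_(beta),
        backend_(backend) {}

  const std::string& function() const { return function_; }
  float alpha() const { return alpha_; }
  float beta() const { return beta_; }
  ParBackend backend() const { return backend_; }

 private:
  std::string function_;
  float alpha_;
  float beta_;
  ParBackend backend_;
};
```

With this shape, a call such as `EWLayerSketch("relu", 0.0F, 0.0F, 2)` no longer compiles, because an int does not convert implicitly to a scoped enum; the caller has to write `ParBackend::TBB`.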

allnes · 2025-11-08T13:08:39Z

+  int available_threads = -1;
+  if (type_parall_ == 0) available_threads = 1;
+  if (type_parall_ == 1)
+    available_threads = std::thread::hardware_concurrency();
+  if (type_parall_ == 2)
+    available_threads = oneapi::tbb::info::default_concurrency();
+  if (type_parall_ == 3) available_threads = omp_get_max_threads();


Please wrap this in a common function that returns the thread count.
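One possible shape for such a helper, as a sketch only. The function name `available_threads` is hypothetical, and the `HAS_OPENMP` / `HAS_TBB` macros are assumed to be defined by the build system when the corresponding library is available:

```cpp
#include <thread>
#ifdef HAS_OPENMP
#include <omp.h>
#endif
#ifdef HAS_TBB
#include <oneapi/tbb/info.h>
#endif

enum class ParBackend { Seq = 0, Threads = 1, TBB = 2, OMP = 3 };

// Returns the number of worker threads the chosen backend would use.
// Falls back to 1 when a backend is not compiled in.
inline int available_threads(ParBackend backend) {
  switch (backend) {
    case ParBackend::Seq:
      return 1;
    case ParBackend::Threads: {
      // hardware_concurrency() may return 0 when the value is unknown.
      const unsigned n = std::thread::hardware_concurrency();
      return n == 0 ? 1 : static_cast<int>(n);
    }
    case ParBackend::TBB:
#ifdef HAS_TBB
      return oneapi::tbb::info::default_concurrency();
#else
      return 1;
#endif
    case ParBackend::OMP:
#ifdef HAS_OPENMP
      return omp_get_max_threads();
#else
      return 1;
#endif
  }
  return 1;  // not reached for valid enum values
}
```

The layer code would then call `available_threads(backend)` once instead of repeating the chain of `if` statements.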

allnes · 2025-11-09T10:52:50Z

@@ -1,5 +1,11 @@
 #pragma once
+#include <omp.h>


Guard the OpenMP/TBB includes and add <thread>; otherwise non-OpenMP builds fail.

#ifdef HAS_OPENMP
#include <omp.h>
#endif
#include <thread>
#ifdef HAS_TBB
#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/info.h>
#endif

allnes · 2025-11-09T10:57:25Z

  EWLayerImpl() = delete;
  EWLayerImpl(const Shape& shape, std::string function, float alpha = 0.0F,
-              float beta = 0.0F);
+              float beta = 0.0F, int type_parall = 0);


Propagate ParBackend through the API instead of a raw int.

allnes · 2025-11-09T11:02:59Z

 };

+template <typename Func>
+inline void parallel_for(int count, Func func, int mode = 0) {


Suggested change

-inline void parallel_for(int count, Func func, int mode = 0) {
+inline void parallel_for(int count, Func func, int mode = 0) {
+  if (count <= 0) return;
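To illustrate why the early return matters, here is a sketch of a std::thread-only variant. `parallel_for_threads` is a hypothetical name and this is not the PR's implementation; it only shows that the chunking arithmetic below assumes a positive count:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical std::thread-only variant; func(i) is called once per index.
template <typename Func>
inline void parallel_for_threads(int count, Func func) {
  if (count <= 0) return;  // nothing to do; keeps `workers` below non-zero

  unsigned hw = std::thread::hardware_concurrency();
  if (hw == 0) hw = 1;  // 0 means "unknown"
  const int workers = std::min(count, static_cast<int>(hw));
  const int chunk = (count + workers - 1) / workers;  // ceil(count / workers)

  std::vector<std::thread> pool;
  pool.reserve(static_cast<std::size_t>(workers));
  for (int w = 0; w < workers; ++w) {
    const int begin = w * chunk;
    if (begin >= count) break;
    const int end = begin + std::min(chunk, count - begin);
    // Each worker processes the half-open index range [begin, end).
    pool.emplace_back([begin, end, &func] {
      for (int i = begin; i < end; ++i) func(i);
    });
  }
  for (auto& t : pool) t.join();
}
```

Without the guard, `count == 0` would give `workers == 0` and a division by zero when computing `chunk`.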

allnes · 2025-11-09T11:10:54Z

@@ -1,5 +1,11 @@
 #pragma once
+#include <omp.h>


Move all backend headers (OpenMP/TBB/Threads) and implementation details into a small parallel module. Expose a single, inline header API so call sites incur no extra call/indirection.

include/parallel/parallel.hpp (inline API)

include/parallel/backends.hpp (backend helpers; guarded includes)

No <omp.h>/TBB headers leaking into layer headers.

Example:

// include/parallel/parallel.hpp
#pragma once
#include <cstddef>

enum class ParBackend { Auto, Seq, Threads, TBB, OMP };

struct ParOptions {
  ParBackend backend = ParBackend::Auto;
  int max_threads = 0;                // 0 = runtime default
  std::size_t min_parallel_n = 4096;  // small tasks stay sequential
  std::size_t grain = 1024;           // backend-specific chunk hint
};

// Header-only: one branch + inlined backend
template <class F>
inline void parallel_for(std::size_t n, F&& f, const ParOptions& opt) {
  if (n == 0) return;
  const ParBackend b = select_backend(opt, n);  // inline, cheap
  switch (b) {
    case ParBackend::Seq:     return impl_seq(n, f);
    case ParBackend::Threads: return impl_threads(n, f, opt);
    case ParBackend::TBB:     return impl_tbb(n, f, opt);
    case ParBackend::OMP:     return impl_omp(n, f, opt);
    case ParBackend::Auto:    return impl_seq(n, f);  // unreachable
  }
}

However, do not add the Auto backend.

allnes · 2025-11-09T11:12:22Z

@@ -1,5 +1,11 @@
 #pragma once
+#include <omp.h>
+


Avoid re-evaluating the “Auto” logic on every call. Resolve it once (feature flags + environment + problem size) and cache the result in the layer/context.

// Called once per layer or first use
inline ParBackend resolve_auto_once(const ParOptions& opt, std::size_t n) noexcept {
#if defined(HAS_OMP)
  if (n >= opt.min_parallel_n) return ParBackend::OMP;
#elif defined(HAS_TBB)
  if (n >= opt.min_parallel_n) return ParBackend::TBB;
#elif defined(HAS_THREADS)
  if (n >= opt.min_parallel_n) return ParBackend::Threads;
#endif
  return ParBackend::Seq;
}

inline ParBackend select_backend(const ParOptions& opt, std::size_t n) noexcept {
  if (opt.backend != ParBackend::Auto) return opt.backend;
  static ParBackend cached = resolve_auto_once(opt, n);  // or store in the layer
  return cached;
}
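The "store in the layer" alternative could look roughly like the sketch below. A function-local static is resolved once per process using whatever `n` the first caller passes, so a per-layer member may be the safer place for the cache. `LayerContextSketch` and `resolve_auto` are hypothetical names, `ParOptions` is trimmed to the fields used here, and the `HAS_OMP` / `HAS_TBB` / `HAS_THREADS` macros are taken from the snippet above and assumed to come from the build system:

```cpp
#include <cstddef>

enum class ParBackend { Auto, Seq, Threads, TBB, OMP };

struct ParOptions {
  ParBackend backend = ParBackend::Auto;
  std::size_t min_parallel_n = 4096;  // small tasks stay sequential
};

// Same preference order as the review snippet: OpenMP, then TBB, then threads.
inline ParBackend resolve_auto(const ParOptions& opt, std::size_t n) noexcept {
  if (n < opt.min_parallel_n) return ParBackend::Seq;
#if defined(HAS_OMP)
  return ParBackend::OMP;
#elif defined(HAS_TBB)
  return ParBackend::TBB;
#elif defined(HAS_THREADS)
  return ParBackend::Threads;
#else
  return ParBackend::Seq;
#endif
}

// Hypothetical layer-side cache: the backend is resolved once, in the
// constructor, from this layer's own problem size.
class LayerContextSketch {
 public:
  LayerContextSketch(const ParOptions& opt, std::size_t n)
      : backend_(opt.backend == ParBackend::Auto ? resolve_auto(opt, n)
                                                 : opt.backend) {}

  ParBackend backend() const noexcept { return backend_; }

 private:
  ParBackend backend_;
};
```

Each layer then passes `backend()` to the parallel loop, so no Auto resolution happens on the hot path.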

aobolensk

I think we can leave the rest of the solution basically as is. The OpenMP slowdown is reproducible, but I suggest focusing on parallel_for itself. In any case, the effect is not that visible on matrix multiplication workloads. For further investigation, we will look at the compilation details (what code it has been lowered to). For now, we can proceed as is.

codecov · 2025-12-01T08:49:35Z

Codecov Report

❌ Patch coverage is 88.00000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.18%. Comparing base (752c273) to head (d9f3d13).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
include/parallel/backends.hpp	80.00%	3 Missing and 5 partials ⚠️
include/parallel/parallel.hpp	85.00%	0 Missing and 3 partials ⚠️
include/layers/EWLayer.hpp	96.87%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
+ Coverage   83.09%   83.18%   +0.09%     
==========================================
  Files          44       46       +2     
  Lines        2271     2349      +78     
  Branches     1349     1397      +48     
==========================================
+ Hits         1887     1954      +67     
- Misses        187      190       +3     
- Partials      197      205       +8


AndreySorokin7 · 2025-12-01T11:13:53Z

aobolensk · 2025-12-02T04:51:40Z

-if (NOT WIN32)
-    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -Werror")
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Werror")
+# if (NOT WIN32)


Please revert; the flags are still required.

70c83f1

I tried to uncomment them, but then I get errors

aobolensk · 2025-12-02T04:51:52Z

+  end = std::chrono::high_resolution_clock::now();
+  total_duration =
+      std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
+  std::cout << "TBB notmatrix: " << total_duration.count() << " ms"


Please clean up the debug prints.

allnes · 2025-12-05T11:16:54Z

Proposed fix for the Linux OpenMP failure:

if(NOT WIN32)
    find_package(OpenMP)
endif()

if(OpenMP_FOUND)
    message(STATUS "OpenMP found - enabling parallel support")
    add_compile_definitions(HAS_OPENMP)
    link_libraries(OpenMP::OpenMP_CXX)
else()
    message(STATUS "OpenMP not found - parallel features disabled")
endif()

Notes:

Drop the forced CMAKE_BUILD_TYPE=Release so CI Debug matrix keeps working.
Linking OpenMP::OpenMP_CXX at the top level propagates -fopenmp to all targets, eliminating the #pragma omp unknown-pragmas error under -Werror.
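On the source side, a complementary guard is possible. The sketch below assumes `HAS_OPENMP` is defined only when CMake finds OpenMP, as in the snippet above; `scale_in_place` is a hypothetical helper, not code from the PR:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: scales a buffer in place.
inline void scale_in_place(std::vector<float>& data, float factor) {
  const long long n = static_cast<long long>(data.size());
  // The pragma is only visible when OpenMP is enabled, so builds without
  // OpenMP never see an unknown pragma and the loop simply runs sequentially.
#ifdef HAS_OPENMP
#pragma omp parallel for
#endif
  for (long long i = 0; i < n; ++i) {
    data[static_cast<std::size_t>(i)] *= factor;
  }
}
```

This keeps -Wall -Wextra -Werror usable on targets that are not linked against OpenMP::OpenMP_CXX.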

f14195d

AndreySorokin7 requested review from allnes and aobolensk as code owners November 4, 2025 09:24

AndreySorokin7 and others added 6 commits November 5, 2025 19:17

fix

b19191f

Merge branch 'main' into AndreySorokin7/Add_parall_ew_layer

9efc295

fix

30a33ff

fix

4a3d16e

fix

406387d

fix

6f82796

allnes reviewed Nov 8, 2025

allnes reviewed Nov 9, 2025

aobolensk approved these changes Nov 9, 2025

AndreySorokin7 added 15 commits November 12, 2025 16:24

fix

3cb8263

fix

2d369b6

fix

0412b6a

fix

5521152

fix

4293356

fix

0ba7e1b

fix

f5f0f14

fix

8222f23

fix

0bb0d02

fix

66b3b93

fix

04c3815

fix

7430245

fix

56c6d89

fix

46bfe0b

fix

16e39bf

AndreySorokin7 and others added 12 commits November 20, 2025 15:04

fix

a9dc8c3

fix

02b39ab

fix

5f921f1

fix

4a8c2f5

fix

a453ed9

fix

40a343b

fix

e1a1825

fix

15ee554

fix

a19776c

fix

f15530c

Merge branch 'main' into AndreySorokin7/Add_parall_ew_layer

14ab537

Update test_ewlayer.cpp

2232cd6

AndreySorokin7 requested a review from allnes December 1, 2025 09:26

AndreySorokin7 added 3 commits December 1, 2025 13:59

Update backends.hpp

e5c56ac

Update backends.hpp

2a3bd2a

Update backends.hpp

971aac0

aobolensk reviewed Dec 2, 2025

AndreySorokin7 added 2 commits December 2, 2025 11:46

fix

70c83f1

fix

ba9ea84

allnes approved these changes Dec 3, 2025

AndreySorokin7 added 5 commits December 3, 2025 18:57

fix

23db74a

fix

c656a85

fix

8e1c6f1

fix

0be51fa

fix

7f88c7a

link_libraries

d9f3d13

aobolensk merged commit c115709 into main Dec 9, 2025
23 checks passed
